Dimension Reduction Forests: Local Variable Importance Using Structured Random Forests

Abstract

Random forests are one of the most popular machine learning methods due to their accuracy and variable importance assessment. However, random forests only provide variable importance in a global sense. There is an increasing need for such assessments at a local level, motivated by applications in personalized medicine, policy-making, and bioinformatics. We propose a new nonparametric estimator that pairs the flexible random forest kernel with local sufficient dimension reduction to adapt to a regression function's local structure. This allows us to estimate a meaningful directional local variable importance measure at each prediction point. We develop a computationally efficient fitting procedure and provide sufficient conditions for the recovery of the splitting directions. We demonstrate significant accuracy gains of our proposed estimator over competing methods on simulated and real regression problems. Finally, we apply the proposed method to seasonal particulate matter concentration data collected in Beijing, China, which yields meaningful local importance measures. The methods presented here are available in the drforest Python package. Supplementary materials for this article are available online.
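
To make the idea of "kernel plus local sufficient dimension reduction" concrete, here is a minimal sketch, not the drforest implementation or API. It assumes sliced inverse regression (SIR) as the dimension reduction step and, as a loudly flagged simplification, replaces the random forest kernel with a Gaussian kernel centred at the prediction point. The functions `weighted_sir_direction` and `gaussian_weights` are hypothetical names for illustration. The leading weighted-SIR direction is a signed (directional) vector; its absolute loadings are used here as one simple local importance score.

```python
import numpy as np


def weighted_sir_direction(X, y, weights, n_slices=10):
    """Leading direction of a kernel-weighted sliced inverse regression.

    X: (n, p) predictors; y: (n,) response; weights: (n,) nonnegative
    kernel weights that localize the fit at one prediction point.
    Returns a unit-norm (p,) direction on the original X scale.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize weights to sum to one

    # Weighted mean and covariance of the predictors.
    mu = w @ X
    Xc = X - mu
    sigma = (Xc * w[:, None]).T @ Xc

    # Whiten the predictors: Z = Xc @ Sigma^{-1/2}.
    evals, evecs = np.linalg.eigh(sigma)
    inv_sqrt = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T
    Z = Xc @ inv_sqrt

    # Slice the response into equal-count groups by rank and
    # accumulate the weighted covariance of the slice means of Z.
    order = np.argsort(y)
    M = np.zeros((X.shape[1], X.shape[1]))
    for idx in np.array_split(order, n_slices):
        p_h = w[idx].sum()
        if p_h <= 0:
            continue
        m_h = w[idx] @ Z[idx] / p_h  # weighted slice mean
        M += p_h * np.outer(m_h, m_h)

    # Leading eigenvector of M, mapped back to the original scale.
    _, vecs = np.linalg.eigh(M)
    beta = inv_sqrt @ vecs[:, -1]
    return beta / np.linalg.norm(beta)


def gaussian_weights(X, x0, bandwidth):
    # ASSUMPTION: Gaussian kernel used as a stand-in for the forest kernel.
    d2 = ((np.asarray(X) - np.asarray(x0)) ** 2).sum(axis=1)
    return np.exp(-0.5 * d2 / bandwidth ** 2)


# Usage on simulated data where only the first two predictors matter.
rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 3))
y = X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.standard_normal(2000)
w = gaussian_weights(X, x0=np.zeros(3), bandwidth=1.5)
beta = weighted_sir_direction(X, y, w)   # signed local direction
importance = np.abs(beta)                # local importance scores
```

In this toy example the recovered direction should be close to a multiple of (1, 0.5, 0), so the third predictor receives a near-zero local importance. In a forest-based version, the weights would instead reflect how often each training point shares a leaf with the prediction point across trees.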

Similar resources

Variable selection using random forests

This paper proposes, focusing on random forests, the increasingly used statistical method for classification and regression problems introduced by Leo Breiman in 2001, to investigate two classical issues of variable selection. The first one is to find important variables for interpretation and the second one is more restrictive and tries to design a good prediction model. The main contribution is...

Variable Selection Using Random Forests

One of the main topics in the development of predictive models is the identification of variables which are predictors of a given outcome. Automated model selection methods, such as backward or forward stepwise regression, are classical solutions to this problem, but are generally based on strong assumptions about the functional form of the model or the distribution of residuals. In this paper a...

Using Random Forests in the Structured Language Model

In this paper, we explore the use of Random Forests (RFs) in the structured language model (SLM), which uses rich syntactic information in predicting the next word based on words already seen. The goal in this work is to construct RFs by randomly growing Decision Trees (DTs) using syntactic information and investigate the performance of the SLM modeled by the RFs in automatic speech recognition...

VSURF: An R Package for Variable Selection Using Random Forests

This paper describes the R package VSURF. Based on random forests, and for both regression and classification problems, it returns two subsets of variables. The first is a subset of important variables including some redundancy which can be relevant for interpretation, and the second one is a smaller subset corresponding to a model trying to avoid redundancy focusing more closely on the predict...

Variable Selection in Time Series Forecasting Using Random Forests

Time series forecasting using machine learning algorithms has gained popularity recently. Random forest is a machine learning algorithm implemented in time series forecasting; however, most of its forecasting properties have remained unexplored. Here we focus on assessing the performance of random forests in one-step forecasting using two large datasets of short time series with the aim to sugg...

Journal

Journal title: Journal of Computational and Graphical Statistics

Year: 2022

ISSN: 1061-8600, 1537-2715

DOI: https://doi.org/10.1080/10618600.2022.2069777